Python Memory Optimisation

The 232-Byte Empty Dictionary

Run this in Python:

import sys

d = {}
print(f"sys.getsizeof({{}}) = {sys.getsizeof(d)} bytes")

sys.getsizeof({}) = 232 bytes

In this run an empty Python dictionary reports 232 bytes. sys.getsizeof measures only the dict object itself: it does not follow references to keys or values, and it does not count the 8-byte pointer held by whatever container refers to the dict, or the allocator's own bookkeeping. The real cost to your process is therefore somewhat higher than the reported figure, and all of it is paid before you store a single key-value pair. Exact numbers depend on the CPython version and platform; 64-bit CPython 3.8 and later report 64 bytes for a truly empty dict and allocate the larger hash table when the first key is inserted.

Now scale that:

import tracemalloc

# A cache of 10 million empty dicts, built once while tracemalloc is recording
tracemalloc.start()
cache = [{} for _ in range(10_000_000)]
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()
stats = snapshot.statistics('lineno')   # sorted, largest allocation site first
print(f"Top allocation: {stats[0].size / 1024 / 1024:.1f} MB")

Top allocation: 2,632.4 MB   (~264 bytes × 10M)

2.6 GB of RAM - and you have not stored a single value yet. This is Python's object overhead in action. Production systems that cache millions of small objects routinely hit OOM errors from overhead alone.

This lesson shows how to measure that overhead and reduce it by 3–10x.

What You Will Learn

Understand Python's object memory model and where the overhead comes from
Use __slots__ to eliminate per-instance __dict__ overhead
Use generators instead of lists to process large sequences without materialising them
Use array.array and NumPy structured arrays instead of Python lists for homogeneous data
Use mmap to work with files too large to fit in RAM
Tune the garbage collector to reduce GC pauses in latency-sensitive services
Use weakref to build caches that do not prevent garbage collection
Use objgraph to find memory leaks in production

Prerequisites

Requirement	Level Needed
Python classes and instances	Comfortable
Python generators and iterators	Familiar
Context managers (`with`)	Comfortable
Basic OS concepts (virtual memory)	Helpful

Section 1: Python Object Memory Model

Every Python object carries overhead before any user data is stored. Understanding this model tells you exactly which optimisations are available.

The PyObject Header

All Python objects begin with the same C structure:

/* CPython source - simplified */
typedef struct _object {
    Py_ssize_t ob_refcnt;    /* 8 bytes: reference count */
    PyTypeObject *ob_type;   /* 8 bytes: pointer to type */
} PyObject;

This 16-byte header exists on every Python object, regardless of its content. On top of this base, each type adds its own fields.

Measuring Real Object Sizes

import sys

# Primitive types (sizes shown are from one 64-bit CPython build and vary between versions)
print(f"int(0):    {sys.getsizeof(0)} bytes")       # 28 bytes
print(f"int(2**60):{sys.getsizeof(2**60)} bytes")   # 36 bytes (three 30-bit digits)
print(f"float:     {sys.getsizeof(0.0)} bytes")     # 24 bytes
print(f"bool:      {sys.getsizeof(True)} bytes")    # 28 bytes
print(f"str '':    {sys.getsizeof('')} bytes")      # 49 bytes
print(f"str 'a':   {sys.getsizeof('a')} bytes")     # 50 bytes
print(f"None:      {sys.getsizeof(None)} bytes")    # 16 bytes

# Containers (empty)
print(f"list []:   {sys.getsizeof([])} bytes")      # 56 bytes
print(f"tuple ():  {sys.getsizeof(())} bytes")      # 40 bytes
print(f"dict {{}}:  {sys.getsizeof({})} bytes")      # 232 bytes
print(f"set set(): {sys.getsizeof(set())} bytes")   # 216 bytes

Why a Python `int` Is 28 Bytes

A C int64 is 8 bytes. A Python int holding a small value is 28 bytes on 64-bit CPython:

PyObject header:  8 bytes (ob_refcnt)
                  8 bytes (ob_type pointer)
ob_size:          8 bytes (digit count and sign; stored as lv_tag in CPython 3.12+)
ob_digit:         4 bytes (one 30-bit digit holding the value)
Total:           28 bytes
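The breakdown above implies that each additional digit costs a fixed number of bytes. A minimal sketch to check this on your own interpreter, assuming the default 30-bit digits of 64-bit CPython (no byte counts are hard-coded, since they differ between builds):

```python
import sys

# Values that need one, two and three 30-bit digits respectively.
values = [1, 2**30, 2**60]
sizes = [sys.getsizeof(v) for v in values]

for value, size in zip(values, sizes):
    print(f"{value.bit_length():>3} bits -> {size} bytes")

# Difference in size between consecutive values: the cost of one extra digit.
steps = [b - a for a, b in zip(sizes, sizes[1:])]
print(f"bytes added per step: {steps}")
```

If both steps are equal, the size is growing linearly with the number of digits, which is what the ob_digit array predicts.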

For a list of one million integers:

# Python list of 1M integers: 8 bytes/pointer × 1M + 28 bytes/int × 1M
python_list = list(range(1_000_000))
total_size = sys.getsizeof(python_list)
for item in python_list:
    total_size += sys.getsizeof(item)
print(f"Python list of 1M ints: {total_size / 1024 / 1024:.1f} MB")

# NumPy array of 1M int64: 8 bytes/element × 1M
import numpy as np
numpy_array = np.arange(1_000_000, dtype=np.int64)
print(f"NumPy array of 1M int64: {numpy_array.nbytes / 1024 / 1024:.1f} MB")

Python list of 1M ints:   34.3 MB
NumPy array of 1M int64:   7.6 MB   (4.5x smaller)

Small Integer Caching

CPython caches integers from -5 to 256. These are singletons - there is only one int(0) object in the entire interpreter, and all code that uses 0 gets a reference to the same object.

a = 100
b = 100
print(a is b)    # True - same object

a = int("300")   # built at runtime, so the compiler cannot merge the two constants
b = int("300")
print(a is b)    # False - two separate objects (above cache range)

This means the overhead calculation above is partly mitigated for small integers in tight loops. However, for production systems storing millions of user IDs (often > 256), the caching does not help and each integer is a separate heap allocation.

Section 2: `__slots__` - Eliminating Per-Instance `__dict__`

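By default, every Python instance stores its attributes in a dictionary (__dict__). For a class with 5 attributes, this means paying the dict overhead (232 bytes in the measurement below) on every single instance. CPython's key-sharing dictionaries (PEP 412) let instances of one class share their key table, so the figure you measure depends on the interpreter version.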

The Default (With `__dict__`)

import sys

class UserRecord:
    def __init__(self, user_id: int, name: str, email: str,
                 score: float, active: bool):
        self.user_id = user_id
        self.name = name
        self.email = email
        self.score = score
        self.active = active

u = UserRecord(1, "Alice", "alice@example.com", 98.5, True)

# Measure the actual memory cost
base_size = sys.getsizeof(u)
dict_size = sys.getsizeof(u.__dict__)
# The attribute-name strings are interned and shared across instances, so they are not counted here
print(f"Object base:        {base_size} bytes")
print(f"Instance __dict__:  {dict_size} bytes")
print(f"Total (approx):     {base_size + dict_size} bytes")

Object base:        48 bytes
Instance __dict__:  232 bytes
Total (approx):     280 bytes

That is 280 bytes before counting the actual attribute values.

With `__slots__`

class UserRecordSlots:
    __slots__ = ('user_id', 'name', 'email', 'score', 'active')

    def __init__(self, user_id: int, name: str, email: str,
                 score: float, active: bool):
        self.user_id = user_id
        self.name = name
        self.email = email
        self.score = score
        self.active = active

u = UserRecordSlots(1, "Alice", "alice@example.com", 98.5, True)

print(f"Object with __slots__: {sys.getsizeof(u)} bytes")
print(f"Has __dict__: {hasattr(u, '__dict__')}")

Object with __slots__: 104 bytes
Has __dict__: False

104 bytes vs 280 bytes. That is a 2.7x reduction for this example.

For 1 million instances:

import tracemalloc

tracemalloc.start()
users_without_slots = [
    UserRecord(i, f"user{i}", f"user{i}@example.com", float(i), True)
    for i in range(1_000_000)
]
snap1 = tracemalloc.take_snapshot()
tracemalloc.stop()

tracemalloc.start()
users_with_slots = [
    UserRecordSlots(i, f"user{i}", f"user{i}@example.com", float(i), True)
    for i in range(1_000_000)
]
snap2 = tracemalloc.take_snapshot()
tracemalloc.stop()

print(f"Without __slots__: {snap1.statistics('lineno')[0].size / 1024 / 1024:.1f} MB")
print(f"With __slots__:    {snap2.statistics('lineno')[0].size / 1024 / 1024:.1f} MB")

Without __slots__: 312.4 MB
With __slots__:    112.7 MB

Python 3.10+ `@dataclass(slots=True)`

import sys
from dataclasses import dataclass

@dataclass(slots=True, frozen=True)   # slots=True requires Python 3.10+
class UserRecord:
    user_id: int
    name: str
    email: str
    score: float
    active: bool

u = UserRecord(1, "Alice", "alice@example.com", 98.5, True)
print(sys.getsizeof(u))   # 104 bytes

frozen=True makes instances immutable and, together with the default eq=True, hashable - so they can be stored in sets and used as dict or cache keys.

`__slots__` Caveats

Behaviour	With `__dict__`	With `__slots__`
Add arbitrary new attributes	Yes	No (`AttributeError`)
Pickle/unpickle	Yes	Yes with protocol 2+ (the Python 3 default); protocols 0 and 1 require `__getstate__`
Multiple inheritance	Works	Complex - use carefully
`weakref` support	Yes	Add `'__weakref__'` to slots
Works with `__init_subclass__`	Yes	Yes
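The first and fourth rows of the table can be checked directly. The sketch below uses a hypothetical Point class (not part of the lesson's code) that lists '__weakref__' among its slots:

```python
import weakref

class Point:
    # Hypothetical example class; '__weakref__' is listed so weak references work.
    __slots__ = ("x", "y", "__weakref__")

    def __init__(self, x: float, y: float):
        self.x = x
        self.y = y

p = Point(1.0, 2.0)

# Attributes outside __slots__ are rejected with AttributeError.
try:
    p.z = 3.0
    added = True
except AttributeError:
    added = False
print(f"could add p.z: {added}")

# There is no per-instance __dict__ to fall back on.
print(f"has __dict__: {hasattr(p, '__dict__')}")

# Weak references work only because '__weakref__' was declared as a slot.
ref = weakref.ref(p)
print(f"weakref resolves: {ref() is p}")
```

Removing '__weakref__' from the tuple makes the weakref.ref call raise TypeError instead.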

Section 3: Generators vs Lists

The most common Python memory antipattern: materialising a large sequence into a list when you only need to iterate over it once.

List vs Generator

import tracemalloc

# BAD: materialises 1M integers into a list, then iterates
tracemalloc.start()
total = sum([x * x for x in range(1_000_000)])   # list comprehension
_, peak = tracemalloc.get_traced_memory()        # (current, peak) in bytes
print(f"List sum: {peak / 1024 / 1024:.1f} MB peak")
tracemalloc.stop()

# GOOD: generator - never materialises the full sequence
tracemalloc.start()
total = sum(x * x for x in range(1_000_000))     # generator expression
_, peak = tracemalloc.get_traced_memory()        # the list above is already freed, so read the peak
print(f"Generator sum: {peak / 1024:.1f} KB peak")
tracemalloc.stop()

List sum:      7.6 MB peak
Generator sum: 0.1 KB peak

The generator yields one value at a time, processes it, and discards it. Peak memory is independent of the sequence length.
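A quick way to see that a generator's footprint does not depend on the sequence length is to compare the objects themselves. This is a small sketch using only sys.getsizeof, so it shows the size of the generator and list objects, not of the integers they produce:

```python
import sys

small_gen = (x * x for x in range(10))
large_gen = (x * x for x in range(1_000_000))
small_list = [x * x for x in range(10)]
large_list = [x * x for x in range(100_000)]

# A generator stores only its paused frame, so both have the same size.
gen_sizes_equal = sys.getsizeof(small_gen) == sys.getsizeof(large_gen)
# A list stores one pointer per element, so it grows with the input.
list_grew = sys.getsizeof(large_list) > sys.getsizeof(small_list)

print(f"generator size independent of length: {gen_sizes_equal}")
print(f"list size grows with length:          {list_grew}")
```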

`yield from` for Generator Chaining

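from typing import Generator, Iterable

# parse_single_line() and handle_error() are application-specific helpers assumed to exist elsewhere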

def read_log_chunks(
    filepath: str,
    chunk_size: int = 8192,
) -> Generator[str, None, None]:
    """Read a large log file in chunks - never loads whole file."""
    with open(filepath, 'r') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


def parse_log_lines(chunks: Iterable[str]) -> Generator[dict, None, None]:
    """Parse log lines from chunks - pipeline continues lazily."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        lines = buffer.split('\n')
        buffer = lines[-1]   # incomplete last line

        for line in lines[:-1]:
            if line.strip():
                yield parse_single_line(line)

    # Flush remaining
    if buffer.strip():
        yield parse_single_line(buffer)


def filter_errors(records: Iterable[dict]) -> Generator[dict, None, None]:
    """Filter to only error-level records."""
    for record in records:
        if record.get('level') == 'ERROR':
            yield record


def process_large_log(filepath: str) -> int:
    """
    Process a 10GB log file in constant memory using generator pipeline.
    Memory usage: O(chunk_size) - independent of file size.
    """
    chunks = read_log_chunks(filepath, chunk_size=65536)
    records = parse_log_lines(chunks)
    errors = filter_errors(records)

    error_count = 0
    for error_record in errors:
        handle_error(error_record)
        error_count += 1

    return error_count
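The pipeline above chains generators by passing one into the next. The yield from statement named in the heading is the other common way to chain: it delegates to an inner iterable without an explicit loop. A minimal sketch, with read_lines and read_all as hypothetical helper names and two temporary files standing in for log shards:

```python
import os
import tempfile
from typing import Iterable, Iterator

def read_lines(path: str) -> Iterator[str]:
    """Yield the lines of one file lazily."""
    with open(path, "r", encoding="utf-8") as f:
        yield from f   # delegate to the file object's own line iterator

def read_all(paths: Iterable[str]) -> Iterator[str]:
    """Yield the lines of several files as one lazy stream."""
    for path in paths:
        yield from read_lines(path)

# Demo: two small files created in a temporary directory.
tmpdir = tempfile.mkdtemp()
paths = []
for name, body in [("a.log", "one\ntwo\n"), ("b.log", "three\n")]:
    path = os.path.join(tmpdir, name)
    with open(path, "w", encoding="utf-8") as f:
        f.write(body)
    paths.append(path)

lines = [line.rstrip("\n") for line in read_all(paths)]
print(lines)
```

Only one file is open at a time, and no file is read further than the consumer has asked for.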

`itertools` for Memory-Efficient Operations

import itertools

# Chaining large iterables without materialisation
def process_all_shards(shard_paths: list[str]):
    """Process multiple file shards as one lazy stream."""
    all_records = itertools.chain.from_iterable(
        parse_log_lines(read_log_chunks(path))
        for path in shard_paths
    )
    for record in all_records:
        process(record)

# Batching a generator into fixed-size chunks
# (Python 3.12+ ships itertools.batched, which yields tuples instead of lists)
def batched(iterable, n: int):
    """Yield successive n-sized batches from iterable."""
    it = iter(iterable)
    while True:
        batch = list(itertools.islice(it, n))
        if not batch:
            break
        yield batch

# Process 100M items in batches of 1000 - O(1000) peak memory
for batch in batched(large_generator(), n=1000):
    bulk_insert_db(batch)

Section 4: `array` Module and NumPy Arrays

Python lists can hold objects of mixed types. This flexibility costs memory - each element is a pointer to a heap-allocated PyObject. For homogeneous numerical data, array.array or NumPy arrays are the correct choice.

`array.array` vs Python List

import array
import sys
import numpy as np

n = 1_000_000

# Python list of floats
py_list = [float(i) for i in range(n)]
print(f"Python list:   {sys.getsizeof(py_list) // 1024 / 1024:.1f} MB overhead")
# Plus ~24 bytes per float object = ~24 MB for the float objects themselves

# array.array of doubles (C double = 8 bytes each)
arr = array.array('d', range(n))
print(f"array.array:   {arr.buffer_info()[1] * arr.itemsize // 1024} KB")

# NumPy array of float64
np_arr = np.arange(n, dtype=np.float64)
print(f"NumPy array:   {np_arr.nbytes // 1024} KB")

Python list:   8.0 MB (just the list object - floats add ~24 MB more)
array.array:   7,812 KB  (~7.6 MB)
NumPy array:   7,812 KB  (~7.6 MB)

For 1M floats: Python list ≈ 32 MB total, array.array ≈ 7.6 MB - a 4x reduction.
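The saving comes from array.array storing raw C values rather than pointers to Python objects. A small sketch of what that implies in practice: fixed item size, a raw byte buffer, and rejection of values of the wrong type.

```python
import array

values = array.array("d", [1.5, 2.5, 3.5])

# Every element occupies exactly itemsize bytes in one contiguous buffer.
item_bytes = values.itemsize
raw = values.tobytes()
print(f"{len(values)} items x {item_bytes} bytes = {len(raw)} bytes")

# Only values convertible to the declared C type are accepted.
try:
    values.append("not a number")
    rejected = False
except TypeError:
    rejected = True
print(f"string rejected: {rejected}")

# The raw buffer round-trips without creating intermediate float objects.
restored = array.array("d")
restored.frombytes(raw)
print(restored.tolist())
```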

Type Codes for `array.array`

Code	C Type	Bytes
`'b'`	signed char	1
`'B'`	unsigned char	1
`'h'`	signed short	2
`'H'`	unsigned short	2
`'i'`	signed int	2–4
`'I'`	unsigned int	2–4
`'l'`	signed long	4–8
`'L'`	unsigned long	4–8
`'q'`	signed long long	8
`'Q'`	unsigned long long	8
`'f'`	float	4
`'d'`	double	8

NumPy Structured Arrays - Production-Grade Records

For heterogeneous records (like a database table in memory), NumPy structured arrays pack data into a contiguous C struct layout:

import numpy as np

# Define the schema
dtype = np.dtype([
    ('user_id',    np.int64),     # 8 bytes
    ('score',      np.float32),   # 4 bytes
    ('active',     np.bool_),     # 1 byte
    ('category',   np.uint8),     # 1 byte
    # 2 bytes of trailing padding, added because align=True
], align=True)                    # without align=True NumPy packs the fields into 14 bytes

# Create 1M records
records = np.zeros(1_000_000, dtype=dtype)
records['user_id'] = np.arange(1_000_000)
records['score'] = np.random.rand(1_000_000).astype(np.float32)
records['active'] = True

print(f"Structured array: {records.nbytes / 1024 / 1024:.1f} MB")
# 1M × 16 bytes = 16,000,000 bytes ≈ 15.3 MB

# Column access is O(1) - NumPy returns a view, no copy
scores = records['score']   # view into the structured array

# Boolean-mask filtering returns a compact copy of the matching rows
active_users = records[records['active']]
print(f"Active users slice: {active_users.nbytes / 1024 / 1024:.1f} MB")

Structured array: 15.3 MB
Active users slice: 15.3 MB

Compare to 1M Python UserRecord objects: ~312 MB. The structured array is roughly 20x smaller, although it holds only the numeric fields and not the name and email strings.
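The record size of a C-struct layout can be checked without NumPy. The sketch below uses the standard-library struct module as a stand-in (an illustration of the padding arithmetic, not NumPy's own API): the four fields occupy 14 bytes when packed, and 16 bytes once the record is padded to an int64 boundary.

```python
import struct

# q = int64, f = float32, ? = bool, B = uint8
packed_size = struct.calcsize("=qf?B")       # standard sizes, no padding
aligned_size = struct.calcsize("@qf?B0q")    # native alignment; '0q' pads to an int64 boundary

print(f"packed record:  {packed_size} bytes")
print(f"aligned record: {aligned_size} bytes")
print(f"1M aligned records: {1_000_000 * aligned_size / 1024 / 1024:.1f} MB")
```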

Section 5: Memory-Mapped Files (`mmap`)

mmap maps a file on disk directly into the process's virtual address space. Reading a slice of the mapping causes the OS to load only that page from disk - not the entire file. This is the standard approach for files too large to fit in RAM.

import mmap
import os

def process_large_csv_mmap(filepath: str) -> dict:
    """
    Process a large CSV without loading it entirely into memory.
    Resident memory is bounded by the pages the OS keeps cached, not by file size.
    Assumes a simple CSV with no quoted commas or embedded newlines.
    """
    file_size = os.path.getsize(filepath)
    print(f"File size: {file_size / 1024 / 1024 / 1024:.2f} GB")

    counts = {'rows': 0, 'errors': 0}
    if file_size == 0:
        return counts   # an empty file cannot be memory-mapped

    with open(filepath, 'rb') as f:   # binary mode: the mapping exposes bytes
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # The first line is the header
            header = mm.readline().decode('utf-8').rstrip('\r\n')
            columns = header.split(',')
            print(f"Columns: {columns}")

            # readline() walks the mapping one line at a time and also returns
            # a final line that has no trailing newline
            for raw_line in iter(mm.readline, b''):
                line = raw_line.decode('utf-8').rstrip('\r\n')

                fields = line.split(',')
                if len(fields) != len(columns):
                    counts['errors'] += 1
                    continue

                counts['rows'] += 1

    return counts
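To see the mapping behave like a bytes object, here is a self-contained sketch that writes a tiny temporary CSV (a stand-in for a large file), maps it, and reads it with find, slicing and readline:

```python
import mmap
import os
import tempfile

# Create a small stand-in file; the last line deliberately lacks a newline.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "wb") as f:
    f.write(b"id,name\n1,alice\n2,bob")

with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        size = len(mm)                       # length of the mapping in bytes
        first_line_end = mm.find(b"\n")      # search without reading into Python
        header = mm[:first_line_end]         # slicing copies just these bytes
        mm.seek(first_line_end + 1)
        rows = [line.rstrip(b"\r\n") for line in iter(mm.readline, b"")]

os.remove(path)
print(size, header, rows)
```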

Searching a Large File Without Loading It

import mmap

def find_pattern_in_large_file(filepath: str, pattern: bytes) -> list[int]:
    """
    Find all byte offsets of a pattern in a large file.
    Works on files larger than RAM.
    Returns byte offsets.
    """
    offsets = []
    with open(filepath, 'rb') as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            offset = 0
            while True:
                found = mm.find(pattern, offset)
                if found == -1:
                    break
                offsets.append(found)
                offset = found + 1
    return offsets


def extract_sections_mmap(filepath: str, start_bytes: list[int],
                           section_size: int) -> list[bytes]:
    """
    Extract fixed-size sections from specific offsets in a large binary file.
    Only the pages that overlap each section are read from disk.
    """
    sections = []
    with open(filepath, 'rb') as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for offset in start_bytes:
                sections.append(bytes(mm[offset:offset + section_size]))
    return sections

Writing with `mmap`

import mmap
import struct

def write_results_mmap(filepath: str, results: list[float]) -> None:
    """
    Write results to a pre-allocated file using mmap.
    Faster than sequential writes for random-access patterns.
    """
    record_size = struct.calcsize('d')   # C double = 8 bytes
    total_size = len(results) * record_size

    with open(filepath, 'w+b') as f:
        if total_size == 0:
            return   # nothing to write; an empty file cannot be mapped

        # Pre-allocate the file to its final size
        f.truncate(total_size)

        with mmap.mmap(f.fileno(), total_size) as mm:
            for i, val in enumerate(results):
                mm[i * record_size:(i + 1) * record_size] = struct.pack('d', val)
            mm.flush()   # push modified pages to disk before unmapping
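Reading such a file back is the mirror image. A minimal sketch, assuming the same fixed-size layout of native C doubles; the temporary file here is written directly with struct.pack so the example stands alone:

```python
import mmap
import os
import struct

import tempfile

values = [1.0, 2.5, -3.25]
record_size = struct.calcsize("d")

# Write a small binary file of doubles to read back.
fd, path = tempfile.mkstemp(suffix=".bin")
with os.fdopen(fd, "wb") as f:
    f.write(struct.pack(f"{len(values)}d", *values))

with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # Random access: decode only record 1, straight from the mapping.
        (second,) = struct.unpack_from("d", mm, 1 * record_size)
        # Sequential access: decode every record in one call.
        everything = list(struct.unpack_from(f"{len(mm) // record_size}d", mm, 0))

os.remove(path)
print(second, everything)
```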

Section 6: Garbage Collector Tuning

Python uses reference counting as its primary memory management. When a reference count drops to zero, the object is immediately freed. The cyclic garbage collector (GC) handles the case that reference counting cannot: objects that reference each other in cycles.
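The difference between the two mechanisms can be observed with a weak reference. In this sketch (Node is a hypothetical class, and the behaviour shown is CPython's), an acyclic object disappears as soon as its last reference is deleted, while a two-object cycle survives until the cyclic collector runs:

```python
import gc
import weakref

class Node:
    def __init__(self):
        self.other = None

was_enabled = gc.isenabled()
gc.disable()   # keep the cyclic collector from running on its own during the demo

# Acyclic object: freed immediately by reference counting.
single = Node()
single_ref = weakref.ref(single)
del single
freed_by_refcount = single_ref() is None

# Two objects referencing each other: reference counts never reach zero.
a, b = Node(), Node()
a.other, b.other = b, a
cycle_ref = weakref.ref(a)
del a, b
alive_before_collect = cycle_ref() is not None

gc.collect()   # the cyclic collector finds and frees the unreachable pair
freed_by_gc = cycle_ref() is None

if was_enabled:
    gc.enable()

print(freed_by_refcount, alive_before_collect, freed_by_gc)
```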

import gc

# Show current GC settings
print(gc.get_threshold())   # (700, 10, 10) on many CPython 3 releases - thresholds for each generation

# Force a full collection manually
gc.collect()

# Disable GC for a throughput-critical section
gc.disable()
try:
    # Process a large batch - no GC pauses
    results = [compute(item) for item in large_batch]
finally:
    gc.enable()
    gc.collect()   # collect manually after the batch

The GC Pause Problem

Python's GC runs periodically based on allocation counts. While it runs, the entire program is paused, and on a heap with many live container objects a full collection can take 10–100ms. In a latency-sensitive service (e.g., real-time trading, gaming, video streaming), pauses of that size are unacceptable.

# Demonstrate GC pause
import gc
import time

# Create many objects with circular references to force GC work
gc.disable()

# Build up garbage
objects = []
for i in range(100_000):
    a = {'id': i}
    b = {'ref': a}
    a['ref'] = b   # circular reference
    objects.append(a)

del objects   # objects are now unreachable but reference-counting can't free them

# Measure GC collection time
start = time.perf_counter()
gc.collect()
elapsed = (time.perf_counter() - start) * 1000
print(f"GC collection took: {elapsed:.1f}ms")

gc.enable()

GC collection took: 45.3ms

Mitigation Strategies

Strategy 1: Disable GC for batch operations

import gc
from contextlib import contextmanager

@contextmanager
def no_gc():
    """Disable GC for a block, collect manually at exit."""
    gc.disable()
    try:
        yield
    finally:
        gc.enable()
        gc.collect()

# Use in throughput-critical sections
with no_gc():
    results = process_batch(large_input)

Strategy 2: Tune GC thresholds

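# Defaults: a gen0 collection starts once container allocations minus deallocations
# exceed 700; gen1 is collected after 10 gen0 collections, gen2 after 10 gen1 collections
gc.get_threshold()  # (700, 10, 10)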

# Increase thresholds to reduce GC frequency at cost of more memory
gc.set_threshold(10_000, 20, 20)

# Or disable cyclic GC entirely if you are careful about circular references
# (appropriate for services that use only acyclic data structures)
gc.disable()
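When experimenting with thresholds it helps to save and restore the original values and to watch the live counters. A small sketch using only the gc module; the value 10_000 is the one used above, not a recommendation:

```python
import gc

original = gc.get_threshold()

# Raise the generation-0 threshold so young collections run less often.
gc.set_threshold(10_000, 20, 20)
raised = gc.get_threshold()

# get_count() reports the current counters that are compared against the thresholds.
counts = gc.get_count()
print(f"thresholds: {raised}, current counts: {counts}")

# Restore the interpreter's original settings.
gc.set_threshold(*original)
restored = gc.get_threshold()
```

Only the first threshold is checked in the usage above, because how the later values are interpreted has changed between CPython versions.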

Strategy 3: Schedule GC outside of request handling

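import gc
import asyncio
from contextlib import asynccontextmanager

from fastapi import FastAPI   # third-party; assumed to be installed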

async def periodic_gc(interval_seconds: float = 30.0) -> None:
    """Run GC during low-traffic windows, not during request handling."""
    while True:
        await asyncio.sleep(interval_seconds)
        gc.collect()

# In FastAPI lifespan:
@asynccontextmanager
async def lifespan(app: FastAPI):
    gc.disable()
    task = asyncio.create_task(periodic_gc(interval_seconds=60.0))
    yield
    task.cancel()
    gc.enable()
    gc.collect()

Section 7: `weakref` - Caches That Don't Leak

A common memory leak pattern: a cache holds a strong reference to objects, preventing them from being garbage collected even after all other references are gone.

# BAD: cache holds strong references
_document_cache = {}

def get_document(doc_id: str) -> Document:
    if doc_id not in _document_cache:
        _document_cache[doc_id] = load_from_db(doc_id)
    return _document_cache[doc_id]
# Problem: _document_cache grows forever - it holds the only remaining
# reference to documents once callers are done with them.

weakref allows you to reference an object without preventing it from being garbage collected. If the object has no other references, it is collected, and calling the weak reference afterwards returns None.

import weakref

class Document:
    def __init__(self, doc_id: str, content: str):
        self.doc_id = doc_id
        self.content = content

# WeakValueDictionary: values are weak references
# When a Document has no other strong references, it is removed from the dict
_document_cache: weakref.WeakValueDictionary[str, Document] = \
    weakref.WeakValueDictionary()

def get_document(doc_id: str) -> Document:
    doc = _document_cache.get(doc_id)
    if doc is None:
        doc = load_from_db(doc_id)
        _document_cache[doc_id] = doc   # weakly cached
    return doc

# Callers keep a strong reference while they need the document
doc = get_document("doc-123")
# ... use doc ...
# When doc goes out of scope (or caller deletes it),
# the cache entry is automatically removed
del doc
# _document_cache["doc-123"] is now gone - GC reclaimed the object
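The snippet above depends on a database loader that is not shown. The following self-contained variant replaces it with a hypothetical in-memory load_from_db stub so the eviction can be observed; the immediate removal after del relies on CPython's reference counting.

```python
import weakref

class Document:
    def __init__(self, doc_id: str, content: str):
        self.doc_id = doc_id
        self.content = content

cache = weakref.WeakValueDictionary()
loads = []   # records every simulated database hit

def load_from_db(doc_id: str) -> Document:
    # Hypothetical stand-in for a real database call.
    loads.append(doc_id)
    return Document(doc_id, f"content of {doc_id}")

def get_document(doc_id: str) -> Document:
    doc = cache.get(doc_id)
    if doc is None:
        doc = load_from_db(doc_id)
        cache[doc_id] = doc   # the dict holds only a weak reference
    return doc

first = get_document("doc-123")
second = get_document("doc-123")            # served from the cache
same_object = first is second
cached_while_referenced = "doc-123" in cache

del first, second                           # drop the last strong references
evicted_after_release = "doc-123" not in cache

print(same_object, len(loads), cached_while_referenced, evicted_after_release)
```

The design trade-off is that the cache only helps while some caller still holds the document; it never keeps an otherwise unused object alive.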

`weakref.ref` for Individual References

import weakref

class HeavyProcessor:
    """A large object we want to cache but not keep alive permanently."""
    def __init__(self, config: dict):
        self.config = config
        self.data = [0] * 1_000_000  # expensive to create

_processor_ref = None   # weak reference to the most recently built processor

def get_processor(config: dict) -> HeavyProcessor:
    # Note: a live cached processor is returned regardless of `config`
    global _processor_ref

    # Check if the weak reference is still valid
    if _processor_ref is not None:
        processor = _processor_ref()   # dereference the weak ref
        if processor is not None:
            return processor

    # Create a new processor
    processor = HeavyProcessor(config)
    _processor_ref = weakref.ref(processor)
    return processor

Callbacks on Object Deletion

import weakref

def on_document_gc(ref):
    """Called when the document is garbage collected."""
    print(f"Document {ref} was garbage collected - cleaning up related state")

doc = Document("doc-456", "content here")
weak_doc = weakref.ref(doc, on_document_gc)

del doc   # triggers: "Document <weakref...> was garbage collected"
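The standard library also offers weakref.finalize, which registers a cleanup function together with its arguments, so the callback does not have to work from a dead reference. A minimal sketch, with an events list standing in for real cleanup work:

```python
import weakref

class Document:
    def __init__(self, doc_id: str, content: str):
        self.doc_id = doc_id
        self.content = content

events = []   # stands in for real cleanup work

doc = Document("doc-456", "content here")
# Call events.append("doc-456") once doc is collected.
# The arguments must not reference doc itself, or it would never be freed.
finalizer = weakref.finalize(doc, events.append, "doc-456")
alive_before = finalizer.alive

del doc   # on CPython the finalizer runs immediately
print(alive_before, finalizer.alive, events)
```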

Section 8: `objgraph` - Finding Memory Leaks in Production

objgraph shows you what types of objects are alive in the Python heap and how they are connected. It is the primary tool for finding leaks in long-running processes.

pip install objgraph

Finding the Most Common Object Types

import objgraph

# Show the 20 most common object types in the heap
objgraph.show_most_common_types(limit=20)

UserRecord             1009042   ← suspiciously high
dict                   45823
list                   38291
tuple                  21948
function               8834
type                   2341
weakref                823
...

If UserRecord is at 1M and should be at 10K, you have a leak.

Tracking Growth Between Snapshots

import objgraph
import gc

# Take a baseline snapshot
gc.collect()
objgraph.show_growth(limit=10)   # first call records the baseline (it prints the current counts)

# Simulate some production traffic
for _ in range(100):
    simulate_request()

gc.collect()

# Show what grew since baseline
objgraph.show_growth(limit=10)

UserRecord          1009      +992
dict               46823      +128
list               38336       +45

UserRecord is growing by ~10 per request and not being freed. That is your leak.
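Where objgraph cannot be installed, a rough equivalent of the growth report can be built from the standard library. This sketch is an approximation, not objgraph's implementation: it counts GC-tracked objects by type name with gc.get_objects(), and it uses a hypothetical UserRecord class and a deliberately leaky simulate_request to produce growth.

```python
import gc
from collections import Counter

def type_counts() -> Counter:
    """Count GC-tracked objects by type name."""
    gc.collect()
    return Counter(type(obj).__name__ for obj in gc.get_objects())

class UserRecord:
    # Hypothetical record type used only for this demo.
    def __init__(self, user_id: int):
        self.user_id = user_id

leaked = []   # simulated leak: a module-level list that is never cleared

def simulate_request(i: int) -> None:
    leaked.append(UserRecord(i))

baseline = type_counts()
for i in range(100):
    simulate_request(i)
after = type_counts()

# Keep only the types whose count increased since the baseline.
growth = {name: after[name] - baseline[name]
          for name in after if after[name] > baseline[name]}
for name, delta in sorted(growth.items(), key=lambda kv: kv[1], reverse=True)[:5]:
    print(f"{name:<20}{after[name]:>9} {delta:>+9}")
```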

Finding What's Keeping an Object Alive

import objgraph

# Find all UserRecord objects
records = objgraph.by_type('UserRecord')
suspect = records[0]   # take the first one

# Show what is referencing this object (why it's not GC'd)
objgraph.show_backrefs(
    suspect,
    max_depth=5,
    filename='backrefs.png',   # requires graphviz
)

The backreference graph shows the chain of references keeping the object alive. Common findings:

A module-level list that is never cleared
A callback registered in a global event system
A class variable that accumulates instances
A closure that captures and retains the object

Finding a Leaked Closure

import objgraph

# A pattern that causes closures to accumulate
class EventBus:
    _handlers = []   # class-level list shared by all callers - lives as long as the class

    @classmethod
    def subscribe(cls, handler):
        cls._handlers.append(handler)  # strong reference to closure

def handle_request(user_id: int):
    def on_complete(result):
        log(f"User {user_id}: {result}")   # closure over user_id

    EventBus.subscribe(on_complete)   # BUG: handler never unsubscribed
    process(user_id)

# After 1000 requests:
objgraph.show_most_common_types()
# function    1000+   ← leaked closures
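One way to close this leak is to make every subscription reversible. The sketch below is one possible fix under that assumption, not the only one: subscribe returns an unsubscribe callable, and the request handler invokes it in a finally block. The results list and the inline dispatch loop are stand-ins for the lesson's log and process helpers.

```python
class EventBus:
    _handlers = []

    @classmethod
    def subscribe(cls, handler):
        cls._handlers.append(handler)

        def unsubscribe():
            cls._handlers.remove(handler)

        return unsubscribe   # the caller decides when the handler goes away

results = []   # stands in for log()

def handle_request(user_id: int) -> None:
    def on_complete(result):
        results.append(f"User {user_id}: {result}")

    unsubscribe = EventBus.subscribe(on_complete)
    try:
        # Stand-in for process(user_id): notify the current handlers.
        for handler in list(EventBus._handlers):
            handler("done")
    finally:
        unsubscribe()   # always runs, even if processing raises

for uid in range(1000):
    handle_request(uid)

remaining = len(EventBus._handlers)
print(f"handlers still registered: {remaining}")
```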

Interview Questions

Q1: sys.getsizeof([]) returns 56. Does an empty list really only use 56 bytes? Explain what sys.getsizeof measures and what it misses.

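sys.getsizeof returns the size of the object itself - the list struct in CPython - which for an empty list is 56 bytes on a 64-bit build. That covers the garbage-collector header (16 bytes), the object header (16 bytes), the ob_size field, the pointer to the item array, and the allocated-capacity field (8 bytes each). An empty list has no item array yet; once the list holds items, the array of element pointers is counted too.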

What it misses:

The objects the list contains (each item is a pointer to a separate heap-allocated PyObject - sys.getsizeof does not follow these pointers)
Memory allocator overhead (CPython's small object allocator rounds up to 8- or 16-byte boundaries, depending on the version, and maintains free lists)
The virtual memory page overhead from the OS allocator

For a list of 100 integers, sys.getsizeof reports 856 bytes (56 + 8 bytes per pointer slot). The actual memory consumption including the integer objects is 856 + 100 × 28 = 3,656 bytes. For a full accounting, use tracemalloc or walk the object graph recursively.
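Walking the object graph can be sketched in a few lines. The helper below, deep_getsizeof, is a hypothetical name and a deliberately limited implementation: it follows only dicts, lists, tuples and sets, and counts each object once even when it is referenced several times.

```python
import sys

def deep_getsizeof(obj, seen=None) -> int:
    """Sum sys.getsizeof over obj and the built-in containers it holds."""
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0            # already counted via another reference
    seen.add(id(obj))

    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += deep_getsizeof(key, seen) + deep_getsizeof(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += deep_getsizeof(item, seen)
    return size

numbers = list(range(1000, 1100))      # 100 ints outside the small-int cache
shallow = sys.getsizeof(numbers)
deep = deep_getsizeof(numbers)
print(f"shallow: {shallow} bytes, deep: {deep} bytes")

# A shared object is counted only once.
outer = [numbers, numbers]
shared = deep_getsizeof(outer)
```

Instances of user-defined classes would need their __dict__ or __slots__ followed as well; that is left out here to keep the sketch short.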

Q2: What is __slots__ and what are its tradeoffs? When would you NOT use it?

__slots__ is a class variable that declares a fixed set of instance attributes. When present, Python allocates a fixed C-level struct for the instance attributes instead of a per-instance __dict__. This eliminates the 232-byte dict overhead and replaces it with a few pointers - reducing per-instance memory by 2–3x for typical small data classes.

Tradeoffs:

You cannot add arbitrary attributes at runtime. user.nickname = "Bob" raises AttributeError if nickname is not in __slots__. This is often a feature (prevents typos in attribute names) but can break code that relies on dynamic attribute setting.

Pickling needs care with old protocols. Pickle protocol 2 and later, which includes the Python 3 default, handles __slots__ automatically; with the legacy protocols 0 and 1 you need to implement __getstate__ and __setstate__.

Multiple inheritance is complex. If two parent classes both have __slots__, there are layout compatibility constraints. Often simpler to not use __slots__ in class hierarchies with multiple inheritance.

weakref support is not automatic. If you need weak references to instances, include '__weakref__' in __slots__.

When not to use __slots__:

The class has very few instances (overhead is irrelevant)
The class participates in complex multiple inheritance
The codebase frequently adds dynamic attributes to instances
You must pickle instances with legacy protocol 0 or 1 and do not want to implement state methods

Q3: What is the difference between a list comprehension and a generator expression in terms of memory? When should you use each?

A list comprehension [f(x) for x in iterable] immediately evaluates the entire expression and builds a list object in memory containing all results. Memory usage is O(n) where n is the number of elements.

A generator expression (f(x) for x in iterable) creates a generator object that computes one element at a time, on demand. Memory usage is O(1) - only one element exists at a time, plus the generator's frame state.

Use a list comprehension when:

You need to access elements by index (result[5])
You need to iterate over the results multiple times
You need len() on the results
The downstream code requires a list explicitly

Use a generator expression when:

You iterate over the results exactly once
The sequence may be very long (or infinite)
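You are feeding a function that consumes an iterable incrementally (sum(), max(), any(), etc.); sorted() and str.join() also accept one but build a full list internally, so a generator saves nothing there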
You are building a pipeline of transformations

The subtle rule: if you are passing the result immediately to a function that consumes an iterable, the generator is almost always correct - sum(x*x for x in range(1_000_000)) never allocates the 1M-element list.

Q4: Explain the Python garbage collector and GC pauses. How would you reduce GC pauses in a low-latency trading service?

Python uses reference counting as its primary memory management: when an object's reference count reaches zero, it is immediately freed. Reference counting cannot handle cycles (A references B, B references A - both counts stay at 1 even when unreachable). The cyclic GC handles this case by periodically scanning for unreachable cycles and freeing them.

The GC organises objects into three generations (0, 1, 2). Generation 0 is collected most frequently (by default, when allocations of container objects exceed deallocations by 700). Objects that survive are promoted to later generations and collected less often. A full generation-2 collection can take 10–100ms for a heap with many live objects.

For a low-latency trading service:

1. Eliminate cycles in data structures. The cyclic GC only needs to run if cycles exist. If your hot-path data structures are acyclic (trees, lists without back-references, simple dicts), you can safely disable the cyclic GC entirely with gc.disable().

2. Disable GC during market-hours computation and run it during off-peak windows. gc.disable() before market open, schedule gc.collect() calls during end-of-day processing or overnight.

3. Use __slots__ and value types (NumPy arrays) instead of Python objects in hot paths. Fewer Python objects = fewer objects for the GC to scan.

4. Increase GC thresholds with gc.set_threshold(50000, 20, 20) to reduce collection frequency at the cost of slightly higher peak memory.

5. Monitor GC pauses using gc.callbacks to log collection duration and alert when pauses exceed latency SLAs.
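Point 5 can be sketched with gc.callbacks, which invokes each registered function with a phase of "start" or "stop" and an info dict. The names track_gc and pauses_ms are hypothetical; a real service would send the measurements to its metrics system instead of a list.

```python
import gc
import time

pauses_ms = []      # (generation, duration in ms) for each collection
_started = {}

def track_gc(phase: str, info: dict) -> None:
    if phase == "start":
        _started["t"] = time.perf_counter()
    elif phase == "stop" and "t" in _started:
        elapsed_ms = (time.perf_counter() - _started.pop("t")) * 1000
        pauses_ms.append((info["generation"], elapsed_ms))

gc.callbacks.append(track_gc)
try:
    # Create some cyclic garbage, then force a full collection.
    for i in range(10_000):
        a = {"id": i}
        b = {"ref": a}
        a["ref"] = b
    gc.collect()
finally:
    gc.callbacks.remove(track_gc)   # always unregister the hook

worst = max(ms for _, ms in pauses_ms)
print(f"{len(pauses_ms)} collections observed, worst pause {worst:.2f} ms")
```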

Q5: What is a memory-mapped file and when is it preferable to reading a file with read()?

A memory-mapped file maps file contents directly into the process's virtual address space. When you access a slice of the mapping (e.g., mm[100:200]), the OS loads only the page (typically 4KB) containing those bytes from disk. The rest of the file is not loaded until accessed. The OS manages the page cache - frequently accessed pages stay in memory, cold pages are evicted.

mmap is preferable to read() when:

The file is larger than available RAM. f.read() loads the entire file at once. mmap loads only the pages you access - the entire file is "available" via the mapping but only accessed pages consume RAM. A 100GB file on a 16GB machine: f.read() fails; mmap works.

You need random access into a large file. With read() you must read sequentially (or seek + read for each access). With mmap, mm[offset:offset+size] is a direct memory access - the OS translates it to a disk read if needed, with no Python buffering overhead.

Multiple processes access the same file. The OS shares memory-mapped pages across processes - two processes mapping the same file see the same physical memory pages. One copy in RAM serves both processes.

For large binary format files (NumPy .npy, HDF5 internals, Protocol Buffer files), mmap allows processing subsets without loading the whole structure.

read() is preferable when the file is small (fits easily in memory), you always process the entire file linearly, or you need Python string methods that mmap does not natively provide.
